Segment Choice Models: Feature-Rich Models for Global Distortion in Statistical Machine Translation
Abstract
This paper presents a new approach to distortion (phrase reordering) in phrase-based machine translation (MT). Distortion is modeled as a sequence of choices during translation. The approach yields trainable, probabilistic distortion models that are global: they assign a probability to each possible phrase reordering. These “segment choice” models (SCMs) can be trained on “segment-aligned” sentence pairs; they can be applied during decoding or rescoring. The approach yields a metric called “distortion perplexity” (“disperp”) for comparing SCMs offline on test data, analogous to perplexity for language models. A decision-tree-based SCM is tested on Chinese-to-English translation, and outperforms a baseline distortion penalty approach at the 99% confidence level.
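The abstract does not give the exact definition of disperp, so the following is only a minimal sketch by analogy with language-model perplexity. It assumes that a reordering is scored as the product of per-step segment-choice probabilities, and that disperp is the exponentiated negative average log-probability per choice. The function names, the two toy models, and the per-choice normalization are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical interface: a model receives the segments not yet chosen and the
# previously chosen segment, and returns a probability for each remaining one.
Model = Callable[[Tuple[int, ...], Optional[int]], Dict[int, float]]


def uniform_scm(remaining: Tuple[int, ...], last: Optional[int]) -> Dict[int, float]:
    """Toy stand-in for a trained SCM: all remaining segments equally likely."""
    p = 1.0 / len(remaining)
    return {seg: p for seg in remaining}


def monotone_biased_scm(remaining: Tuple[int, ...], last: Optional[int]) -> Dict[int, float]:
    """Toy model that puts 0.8 on the leftmost remaining segment (assumed value)."""
    if len(remaining) == 1:
        return {remaining[0]: 1.0}
    rest = 0.2 / (len(remaining) - 1)
    return {seg: (0.8 if i == 0 else rest) for i, seg in enumerate(remaining)}


def reordering_log_prob(order: Sequence[int], model: Model) -> float:
    """Log-probability of one reordering as a sum over sequential choices."""
    remaining: List[int] = sorted(order)
    last: Optional[int] = None
    total = 0.0
    for seg in order:
        probs = model(tuple(remaining), last)
        total += math.log(probs[seg])
        remaining.remove(seg)
        last = seg
    return total


def disperp(orders: Sequence[Sequence[int]], model: Model) -> float:
    """Assumed definition: exp of the negative mean log-probability per choice."""
    log_sum = sum(reordering_log_prob(o, model) for o in orders)
    n_choices = sum(len(o) for o in orders)
    return math.exp(-log_sum / n_choices)


if __name__ == "__main__":
    # Mostly monotone toy test data: each list is the order in which source
    # segments were chosen during translation.
    test_orders = [[0, 1, 2], [0, 1, 2, 3], [1, 0, 2]]
    print("uniform:", disperp(test_orders, uniform_scm))
    print("monotone-biased:", disperp(test_orders, monotone_biased_scm))
```

Under these assumptions, a lower disperp means the model assigns higher probability to the reorderings observed in the test data, which is how two SCMs could be compared offline without running the decoder.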
Similar resources
PORTAGE: with Smoothed Phrase Tables and Segment Choice Models
Improvements to Portage and its participation in the shared task of NAACL 2006 Workshop on Statistical Machine Translation are described. Promising ideas in phrase table smoothing and global distortion using feature-rich models are discussed as well as numerous improvements in the software base.
Inductive Detection of Language Features via Clustering Minimal Pairs: Toward Feature-Rich Grammars in Machine Translation
Syntax-based Machine Translation systems have recently become a focus of research with much hope that they will outperform traditional Phrase-Based Statistical Machine Translation (PBSMT). Toward this goal, we present a method for analyzing the morphosyntactic content of language from an Elicitation Corpus such as the one available in the LDC’s LCTL language packs. The presented method discover...
Feature-Rich Phrase-based Translation: Stanford University's Submission to the WMT 2013 Translation Task
We describe the Stanford University NLP Group submission to the 2013 Workshop on Statistical Machine Translation Shared Task. We demonstrate the effectiveness of a new adaptive, online tuning algorithm that scales to large feature and tuning sets. For both English-French and English-German, the algorithm produces feature-rich models that improve over a dense baseline and compare favorably to mo...
Discriminative Feature-Rich Modeling for Syntax-Based Machine Translation
State-of-the-art statistical machine translation systems are most frequently built on phrase-based (Koehn et al., 2003) or hierarchical translation models (Chiang, 2005). In addition, a wide variety of models exploiting syntactic annotation on either the source or target side (or both) have recently been developed and also give state-of-the-art performance (Galley et al., 2006; Zollmann and Venu...
Dynamic distortion in a discriminative reordering model for statistical machine translation
Most phrase-based statistical machine translation systems use a so-called distortion limit to keep the size of the search space manageable. In addition, a distance-based distortion penalty is used as a feature to keep the decoder translating monotonically unless there is sufficient support for a jump from other features, particularly the language models. To overcome the issue of setting the op...